Video quality measurement

ABSTRACT

A measure of quality of compressed video signals is obtained without reference to the original uncompressed version, but generated directly from the coded image parameters, thereby avoiding the need to decode the compressed signal. A first measure is generated from the quantizer step size and a second measure is generated as a function of the number of blocks in the picture that have only one transform coefficient. The two measures are combined. Adjustments may be made to the step-size based measure to compensate for spatial or temporal masking effects.

This application is the US national phase of international application PCT/GB2003/005002 filed 18 Nov. 2003 which designated the U.S. and claims benefit of GB 0228556.7, dated 6 Dec. 2002, the entire content of which is hereby incorporated by reference.

BACKGROUND

1. Technical Field

This invention is concerned with video quality measurement, and more particularly with the assessment of picture quality without reference to a copy of the original undistorted pictures.

2. Related Art

While others have provided various types of video quality measurements (e.g., see U.S. Pat. No. 6,810,083 to Chen et al.), such measurements sometimes require excessive resources and/or provide less than optimally meaningful results.

BRIEF SUMMARY

According to one aspect of the present invention there is provided a method of generating a measure of quality for a video signal that has been encoded using a compression algorithm utilising a variable quantiser step size and a two-dimensional transform, such that the encoded signal includes a quantiser step size parameter and, for blocks of the picture, transform coefficients, the method comprising:

-   a) generating a first quality measure which is a function of said quantiser step size parameter;
-   b) generating a second quality measure which is a function of the number of blocks having a single transform coefficient; and
-   c) combining the first and second measures.

In another aspect, the invention provides an apparatus for generating a measure of quality for a video signal that has been encoded using a compression algorithm utilising a variable quantiser step size and a two-dimensional transform, such that the encoded signal includes a quantiser step size parameter and, for blocks of the picture, transform coefficients, the apparatus comprising:

-   a) means for generating a first quality measure which is a function of said quantiser step size parameter;
-   b) means for generating a second quality measure which is a function of the number of blocks having a single transform coefficient; and
-   c) means for combining the first and second measures.

Other aspects of the invention are defined in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of one example of an apparatus for measuring the quality of a received video signal;

FIG. 2 illustrates graphically characteristics of quantisation distortion;

FIGS. 3 to 6 are graphs illustrating test results;

FIG. 7 illustrates graphically a coefficient amplitude distribution; and

FIGS. 8 to 14 are graphs illustrating further test results.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

In principle the measurement process used is applicable generally to video signals that have been encoded using compression techniques using transform coding and having a variable quantiser step size. The version to be described however is designed for use with signals encoded in accordance with the MPEG-2 standard. (Although the version to be described is based on the MPEG-2 video codec, it also applies to other DCT-based standard codecs, such as H.261, H.263, MPEG-4 (frame based) etc.)

The measurement method is of the non-intrusive or “no-reference” type; that is, it does not need to have access to a copy of the original signal. Moreover it aims to perform its measurements without the need to decode the received signals into video signals; rather, it utilises the parameters contained in the received MPEG stream.

In the apparatus shown in FIG. 1, the incoming signal is received at an input 1 and passes to a VLC decoder and data parser 2 which decodes the variable-length codes and outputs the following parameters:

-   (a) for each picture:
    -   Picture type PT (=I, P or B)
-   (b) for each macroblock (MB) into which the picture is divided:
    -   Macroblock type MT (e.g. INTRA or INTER, skipped, not-coded, etc.)
    -   Quantiser step size Q
-   (c) for each block within the macroblock:
    -   Number of coefficients N_(C)
    -   Coefficients C
    -   Motion vectors MV.

There are two analysis paths in the apparatus, which serve to estimate the peak signal-to-noise ratio (PSNR) of the signal and the “blockiness” of the signal, respectively. The elements 3 to 14 could be implemented by individual hardware elements but a more convenient implementation is to perform all those stages using a suitably programmed processor.

PSNR Estimation

This employs the quantiser step sizes Q. After some adjustments (to be described later) the adjusted step size Q_(M) is used at 3 to calculate the estimated PSNR in accordance with

$\mathrm{PSNR} = 10\log_{10}\left[\frac{255^{2}}{Q_{M}^{2}/12}\right] = 59 - 20\log_{10} Q_{M}$

The derivation of this equation, and of a more sophisticated alternative, will be given below.
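By way of illustration only, the calculation performed at 3 might be sketched in Python as follows. The function name `psnr_uniform` is hypothetical and forms no part of the apparatus described; the logarithm is assumed to be to base 10, which is what the constant 59 implies.

```python
import math


def psnr_uniform(q_m: float) -> float:
    """Estimated PSNR in dB for an adjusted step size Q_M (uniform-pdf model)."""
    if q_m <= 0:
        raise ValueError("quantiser step size must be positive")
    mse = q_m ** 2 / 12.0  # mean squared quantisation error
    return 10.0 * math.log10(255.0 ** 2 / mse)


# The constant 59 in the formula is 10*log10(12 * 255^2), rounded from 58.92.
print(round(10.0 * math.log10(12 * 255 ** 2), 2))  # 58.92
```

Doubling the step size lowers the estimate by about 6 dB, as expected from the term 20 log Q_(M).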

The adjustments referred to consist of three stages:

Stage 4: The quantiser step sizes for B-pictures are adjusted by division by 1.4.

Stage 5: The quantiser step-sizes are adjusted to take into account spatial masking effects, employing a spatial complexity factor X, generated by counting the number of nonzero coefficients N_(C) in each block and assigning values of X (Stage 6) as follows.

-   N_(C)<3: X=1
-   3≤N_(C)<6: X=1.2
-   6≤N_(C)<10: X=1.4
-   10≤N_(C)<15: X=1.6
-   15≤N_(C): X=1.8

An adjusted value Q_(X) is then (at 5) computed as

$Q_{X} = {\sum\limits_{frame}\;\frac{Q}{X}}$

Note also that at this point the opportunity has been taken to sum these values over the whole frame.
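Stages 5 and 6 might be sketched as follows, purely as an example. The function names are hypothetical. The formula for Q_(X) is written as a sum over the frame, whereas the later discussion refers to an average; the sketch therefore offers both, and the default of averaging is an assumption made here so that the result remains on the scale of a step size.

```python
def complexity_factor(n_c: int) -> float:
    """Spatial complexity factor X for a block with n_c non-zero coefficients."""
    if n_c < 3:
        return 1.0
    if n_c < 6:
        return 1.2
    if n_c < 10:
        return 1.4
    if n_c < 15:
        return 1.6
    return 1.8


def spatially_adjusted_q(blocks, average=True):
    """blocks: iterable of (Q, N_C) pairs, Q being the step size of the
    macroblock containing the block.

    Returns the frame total of Q/X, or its mean when average=True
    (averaging is an assumption of this sketch).
    """
    terms = [q / complexity_factor(n_c) for q, n_c in blocks]
    if not terms:
        return 0.0
    total = sum(terms)
    return total / len(terms) if average else total
```

For instance, two blocks with Q=12 having 2 and 4 non-zero coefficients contribute 12/1 and 12/1.2 respectively.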

Next, there is a further adjustment to take account of temporal masking. The motion vectors are used (Stage 8) to derive the motion magnitude V (adjusted for block type; see below for details) averaged over the whole frame.

The adjusted Q is then (Stage 9)

$Q_{M} = \frac{Q_{X}}{\log_{10}\left( {10 + V} \right)}$
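The temporal adjustment of Stage 9 might be sketched as follows; the function name is hypothetical. With no motion (V=0) the divisor is log₁₀(10)=1 and the step size is unchanged, while larger motion progressively reduces the effective step size.

```python
import math


def temporally_adjusted_q(q_x: float, v: float) -> float:
    """Q_M = Q_X / log10(10 + V), with V the frame-averaged motion magnitude."""
    if v < 0:
        raise ValueError("motion magnitude cannot be negative")
    return q_x / math.log10(10.0 + v)
```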

Note that the formula given above for the PSNR calculation assumes a uniform probability density function for the coefficients within the quantiser step. Whilst this is a reasonable assumption for the DC coefficients of an INTRA block it is less accurate for other coefficients. An alternative formula for a non-uniform probability distribution is derived in the discussion below (Equation (7)): if this is used then preferably the parameter α of the pdf is switched according to the type of frame (I, P or B). Note that (on the basis that the DC coefficients of INTRA blocks are in the minority) we would prefer to use Equation (7) for all coefficients, and this is found to give good results in practice. If one wished to use both distributions one would need to avoid summing over the whole frame at Stage 5 and use a formula for PSNR which would then cater for combining Q values with different pdfs.

Blockiness Estimation

The simplest form of this is to make use of the fact that, within an INTRA-coded macroblock, a block that has only one coefficient will cause a blocky appearance in the picture, and calculate the percentage of blocks within the frame that meet this criterion.

Thus at Stage 10 each block is marked b=1 (blocky) or b=0 (not blocky), and then (Stage 11) the number N_(DC) of blocks marked blocky is counted, as is the total number N_(T) of coded blocks, and the quotient B=N_(DC)/N_(T) is calculated.
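The simple form of Stages 10 and 11 might be sketched as follows. The function name and the representation of each coded block as a `(mb_type, n_c)` pair are assumptions made for the example only.

```python
def blockiness_ratio(coded_blocks):
    """coded_blocks: iterable of (mb_type, n_c) pairs for the coded blocks of a
    frame, where mb_type is "INTRA" or "INTER" and n_c is the number of
    non-zero coefficients. Returns B = N_DC / N_T."""
    n_dc = 0
    n_t = 0
    for mb_type, n_c in coded_blocks:
        n_t += 1
        if mb_type == "INTRA" and n_c == 1:  # only one coefficient present
            n_dc += 1
    return n_dc / n_t if n_t else 0.0
```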

An enhanced blockiness measure might also take account of the fact that an INTER-coded block for which all coefficients are zero also results in blockiness if the earlier picture (which a decoder will copy from) is blocky at that position. Thus, Stage 10 may also receive the previous value of b via a delay (the duration of which corresponds to the delay between the earlier frame and the one encoded, which will vary; for example, for MPEG with two B-frames for each P-frame it would be three frame periods).

In this case the formula for Stage 10 becomes:

If (MT=INTRA and N_(C)=1) or (MT=INTER and N_(C)=0 and b⁻¹=1) THEN b=1;

Otherwise b=0.

(where b⁻¹ is the previous-frame value of b for the same block position).
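The enhanced rule might be sketched as follows, again only as an example. The function name and the list-based representation are hypothetical, and the delay is assumed to have been applied already, so that `prev_flags` holds the flags of the earlier frame from which the decoder copies.

```python
def update_blockiness(prev_flags, blocks):
    """prev_flags: list of bool, b for each block position in the earlier frame.
    blocks: list of (mb_type, n_c) for the same positions in the current frame.
    Returns the new list of b values."""
    flags = []
    for b_prev, (mb_type, n_c) in zip(prev_flags, blocks):
        intra_dc_only = mb_type == "INTRA" and n_c == 1
        inherited = mb_type == "INTER" and n_c == 0 and b_prev
        flags.append(intra_dc_only or inherited)
    return flags


previous = [True, False, True]
current = [("INTER", 0), ("INTER", 0), ("INTER", 4)]
print(update_blockiness(previous, current))  # [True, False, False]
```

The first block inherits blockiness from the earlier frame, the second has nothing to inherit, and the third is cleared because it carries coded coefficients.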

Combination

In order to combine the PSNR and blockiness measures it is firstly necessary to convert these measures into measurements on the same scale (Stages 12, 13). A convenient way to do this is to translate the measures into measures M_(Q) and M_(B) on an arbitrary scale from, say, 0 to 9 (0=very poor and 9=very good), using a conversion table as used for the complexity measure X, either by dividing the possible range of PSNR or B into equal steps, or alternatively with an empirical table based on viewing tests.

Having obtained the two measures M_(Q) and M_(B) on the same scale, they are combined into a single measure. The basis on which this is done is that in the event of strong blockiness for a particular frame, the output measure M_(O) from Stage 14 is simply the blockiness measure M_(B). If on the other hand the blockiness is slight, then the output measure M_(O) is the PSNR measure M_(Q).

e.g.

If M_(B)>6, M_(O)=M_(Q)

If M_(B)≤6, M_(O)=M_(B)
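Stages 12 to 14 might be sketched as follows, using the equal-steps option for the conversion. The function names, the PSNR range of 20 to 50 dB and the sample values are assumptions chosen for the example; an empirical table based on viewing tests could be substituted for `to_scale`.

```python
def to_scale(value, lo, hi, higher_is_better=True, levels=10):
    """Map value in [lo, hi] onto an integer 0..levels-1 using equal steps."""
    clipped = min(max(value, lo), hi)
    frac = (clipped - lo) / (hi - lo)
    if not higher_is_better:
        frac = 1.0 - frac  # e.g. a high blockiness ratio B means poor quality
    return min(int(frac * levels), levels - 1)


def combine(m_q, m_b, threshold=6):
    """Output M_O: the PSNR measure unless blockiness is strong (M_B <= threshold)."""
    return m_q if m_b > threshold else m_b


m_q = to_scale(35.0, lo=20.0, hi=50.0)                        # assumed PSNR range
m_b = to_scale(0.25, lo=0.0, hi=1.0, higher_is_better=False)  # B = 0.25
print(combine(m_q, m_b))  # 5
```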

Discussion

The following discussion explains the rationale behind the above-described method, gives derivations of the equations and some experimental results, and describes some modifications and improvements.

The aim of this work was to demonstrate how the required data from the compressed bitstream can be extracted and used as a video quality metric (VQM), without a reference. It is shown that the most important parameter for VQM is the quantiser step size, which can be extracted per macroblock (MB). To take account of the human visual system, this value is modified with the spatio-temporal image content. Here the spatial content is derived from the number of AC coefficients of the coded MBs, and the temporal activity from the motion vectors. To take account of contrast sensitivity, the DC coefficients of the INTRA coded blocks may also be used. Finally, in cases of severe distortion, picture blockiness, which is the dominant distortion, can be extracted. This is done by taking the percentage of INTRA coded blocks with DC as the only non-zero coefficient in the block, over all the coded blocks. This value is modulated with the number of skipped and non-coded MBs, to improve reliability of detection. These are all extracted from the compressed bitstream.

Currently objective measurement of video quality is carried out under three main categories: Full-Reference, Reduced-Reference and No-Reference models. It is believed that the No-Reference model has great potential for being simpler than the other two and is most likely to be used for quality assessment or monitoring throughout networks, where access to the reference data (either full or reduced) is costly or not possible. Thus it is of vital importance to see how the processing complexity of the No-Reference model can be reduced without sacrificing its assessment accuracy.

One way of simplifying the no-reference model complexity is to extract the model parameters directly from the bitstream, without decoding the pictures fully. In fact, most of the current no-reference models work on the decoded pictures, to extract the required model parameters. Thus, the relative simplicity, or processing efficiency, of these two kinds of no-reference model can be gauged by the number of individual processing operations needed under each scheme.

In general, no-reference model parameters include: motion, spatial or contextual details, edges, contrast sensitivity, etc. Generating each of these from the decoded pictures can involve heavy processing. For example, estimation of motion vectors at the encoders normally takes about 60% of the processing power of the encoders (note that encoders might test a variety of motion estimation modes, while as a measure of speed, only one mode is enough, hence the percentage can be less than 60%, but still can be high). Considering that video encoders are 3-5 times more demanding of processing than the video decoders, then one can say that just the motion estimation can be 2-3 times more demanding of processing than decoding a picture. Hence derivation of no-reference model parameters (e.g. motion, spatial or contextual details, edges, contrast sensitivity, blockiness) on the decoded pictures can be several times (e.g. 5) more demanding of processing than decoding a picture. The exact ratio depends on how complex the implementation of each of these may be.

On the other hand, deriving the model parameters from the bitstream, considering that the required information is already embedded in the bitstream, can take a fraction of the processing required to decode a picture. Here, what is needed is the inverse VLC (very fast, using table look-ups), which is a very small fraction of decoding a picture. Hence the complexity of extracting model parameters from the bitstream compared with those derived from the decoded pictures is very small (e.g. 5-10% or even lower), and the proposed method can be used for online monitoring of video quality at a marginal cost.

In the full reference model for picture quality assessment, the encoding distortion or the Peak Signal-to-Noise Ratio (PSNR) is often used. Although it can be argued that the PSNR does not exactly represent the human perception of image quality (distortion), nevertheless it is a very strong indicator. In fact all the known full reference models somehow use the difference (distortion) between the original and the processed images to derive some other perceptually optimised parameters.

Now if we assume that PSNR can be an indicator of quality, then the question is how in the no-reference model without any reference picture it can be used as a measure of quality (distortion). Looking at the way the PSNR is defined, it will become clear that this can be done.

In the Full reference model the PSNR is defined as:

$\mathrm{PSNR} = 10\log_{10}\left[\frac{255^{2}}{\varepsilon^{2}}\right]$

In this equation ε² is the mean squared error between the original and the processed picture. If the coding distortion is just due to the quantisation distortion (which is the case in video coding) then there is a direct relation between the mean squared error, ε², and the quantiser step size Q. For example, for a uniformly distributed signal, with a quantiser step size Q, the mean squared quantisation distortion is as shown in FIG. 2, where (a) shows the distribution of quantisation distortion and (b) shows the probability density function of the coefficient within the quantised range.

The average distortion is calculated as:

$\varepsilon^{2} = \frac{1}{Q}\int_{-Q/2}^{Q/2} x^{2}\,dx = \frac{Q^{2}}{12}$

Thus the PSNR (in dB) in terms of the quantiser step size Q can be defined as:

$\begin{matrix}{\mathrm{PSNR} = 10\log_{10}\left[\frac{255^{2}}{Q^{2}/12}\right] = 59 - 20\log_{10} Q} & (1)\end{matrix}$
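The value Q²/12 underlying Equation (1) can be checked numerically. The following sketch, in which the function name, sample range and sample count are arbitrary choices made for the example, quantises uniformly distributed values with step Q and measures the resulting mean squared error.

```python
import math
import random


def simulated_quantisation_mse(q, n=200_000, seed=0):
    """Quantise n uniformly distributed samples with step q; return the MSE."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1000.0, 1000.0)
        x_hat = q * math.floor(x / q + 0.5)  # round to nearest multiple of q
        total += (x - x_hat) ** 2
    return total / n


q = 8.0
print(simulated_quantisation_mse(q), q ** 2 / 12.0)
```

The two printed values should agree to within the sampling error, which is well under one percent for this sample count.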

It should be noted that, in a video codec, quantisation is applied to the transform coefficients, but the measured distortion is between the original and the decoded pixels. However, since the DCT is a linear orthogonal transform, by Parseval's theorem the energies in the transform and pixel domains are equal (up to a constant factor that depends on the scaling of the transform coefficients in the forward and inverse transforms).

FIG. 3 compares the measured PSNR versus the calculated PSNR from the quantiser step size, using Equation (1) (i.e. assuming a uniform pdf), for a test sequence “New York” at 2 Mbit/s. As we see, despite the very crude approximation of the model, the calculated PSNR very closely approximates the measured PSNR. However, while the measured PSNR is stable and smooth, the calculated PSNR oscillates between the values of anchor (I, P) pictures and the B-pictures. To analyse this behaviour, FIG. 4 shows a few frames of the sequence, at a finer scale. This Figure shows that for B-pictures the calculated PSNR is less than the measured values, and for P and I pictures the opposite is the case. The difference gets larger for larger quantiser step sizes (lower bit rates) as shown in FIGS. 5 and 6 for the same sequence at 1.5 Mbit/s.

Of course, one is not expecting that the two methods should give exactly equal values. However, the fact that the calculated PSNR values for B-pictures are less than those for I- and P-pictures, while their measured values are almost equal, is due to two factors. First, the assumption of uniform quantisation distortions for the coefficients is not correct. The only coefficient that has an almost uniform pdf is the DC coefficient of the INTRA coded blocks. Thus for the AC coefficients of even I-pictures, which have non-uniform amplitude distributions, Equation (1) could with advantage be modified with a known non-uniform density function, f(x), rather than the uniform distribution of 1/Q.

Note that, although the distributions of coefficients are non-uniform, the degree of non-uniformity can vary from picture-type to picture-type. In B-pictures, due to efficient motion compensation, the distribution is very steep, and most of the coefficients are near to zero. This should preferably be taken into account too.

Second, at the encoders, the quantiser step sizes for B-pictures are deliberately increased for two reasons. First, as we have seen, B-pictures are efficiently motion compensated, so they are usually small, no matter what the value of quantiser step size. But when they are large enough to be coded, then it does not make any difference if they are relatively coarsely quantised. The second reason is that since B-pictures are not used by the encoder's prediction loop, then, even if they are distorted, this distortion does not propagate into the following pictures. This is not the case with the I- and P-pictures, where any savings in bits by their coarse quantisation have to be paid back later.

Consider now the calculation of estimated PSNR assuming a non-uniform density function. The actual mean squared error due to quantisation should be

$\begin{matrix}{\varepsilon^{2} = \int_{-Q/2}^{Q/2} f(x)\,x^{2}\,dx} & (2)\end{matrix}$

To derive a closed solution for this integral, we assume the coefficients have a non-uniform distribution of the type

$\begin{matrix}{f(x) = \frac{\beta}{1 + \alpha|x|}} & (3)\end{matrix}$

which is plotted in FIG. 7, where α is the rate of decay of the density function and β is a weighting factor, making sure the pdf is normalised to unity. That is

$\int_{-Q/2}^{Q/2} f(x)\,dx = 1$

Thus β can be found in terms of α. Since f(x) is symmetric about zero,

$2\int_{0}^{Q/2}\frac{\beta}{1 + \alpha x}\,dx = 1$

That results in

$\begin{matrix}{\beta = \frac{\alpha}{2\ln\left(1 + \frac{Q\alpha}{2}\right)}} & (4)\end{matrix}$

With this non-uniform pdf, the mean squared error is:

$\varepsilon^{2} = \int_{-Q/2}^{0}\frac{\beta x^{2}}{1 - \alpha x}\,dx + \int_{0}^{Q/2}\frac{\beta x^{2}}{1 + \alpha x}\,dx = 2\int_{0}^{Q/2}\frac{\beta x^{2}}{1 + \alpha x}\,dx$

The integral can be simplified by letting αx=u, then

$\varepsilon^{2} = \frac{2\beta}{\alpha^{3}}\int_{0}^{Q\alpha/2}\frac{u^{2}}{1 + u}\,du$

After simple manipulations and integration, it is:

$\begin{matrix}{\varepsilon^{2} = \frac{2\beta}{\alpha^{3}}\left[\frac{1}{2}(u - 1)^{2} + \ln(u + 1)\right]_{0}^{Q\alpha/2}} & (5)\end{matrix}$

Substituting the integral limits and the value of β from Equation (4), the value of the mean squared distortion is

$\begin{matrix}{\varepsilon^{2} = \frac{1}{\alpha^{2}\ln\left(1 + \frac{Q\alpha}{2}\right)}\left[\frac{Q^{2}\alpha^{2}}{8} - \frac{Q\alpha}{2} + \ln\left(1 + \frac{Q\alpha}{2}\right)\right]} & (6)\end{matrix}$

and the PSNR with this distortion is:

$\begin{matrix}{\mathrm{PSNR} = 48.13 - 10\log_{10}\left\{\frac{1}{\alpha^{2}\ln\left(1 + \frac{Q\alpha}{2}\right)}\left[\frac{Q^{2}\alpha^{2}}{8} - \frac{Q\alpha}{2} + \ln\left(1 + \frac{Q\alpha}{2}\right)\right]\right\}} & (7)\end{matrix}$
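Equations (6) and (7) might be evaluated as in the following sketch, which also checks the closed form against a direct numerical integration of Equation (2) with the density of Equation (3). The function names and the midpoint-rule integration are choices made for the example and are not part of the method described.

```python
import math


def mse_nonuniform(q, alpha):
    """Mean squared distortion of Equation (6)."""
    t = q * alpha / 2.0
    log_term = math.log1p(t)  # ln(1 + Q*alpha/2)
    return (t * t / 2.0 - t + log_term) / (alpha ** 2 * log_term)


def psnr_nonuniform(q, alpha):
    """PSNR of Equation (7); 10*log10(255^2) is the constant 48.13."""
    return 10.0 * math.log10(255.0 ** 2 / mse_nonuniform(q, alpha))


def mse_by_integration(q, alpha, steps=200_000):
    """Midpoint-rule evaluation of Equation (2) using the pdf of Equation (3)."""
    beta = alpha / (2.0 * math.log1p(q * alpha / 2.0))  # Equation (4)
    dx = q / steps
    total = 0.0
    for i in range(steps):
        x = -q / 2.0 + (i + 0.5) * dx
        total += beta / (1.0 + alpha * abs(x)) * x * x * dx
    return total


print(mse_nonuniform(8.0, 1.0), mse_by_integration(8.0, 1.0))
```

As α tends to zero the density flattens and Equation (6) tends to Q²/12, so Equation (7) approaches Equation (1).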

FIG. 8 compares this new PSNR with the measured value as well as the PSNR with the uniform density function, for the New York sequence at 1.5 Mbit/s. In this figure it is assumed α=1 for all picture types, which is not an ideal choice, but for the sake of simplicity we have chosen this. In reality α should be chosen differently for different picture types, and for each type to fit the measured PSNR. It should also be larger for lower bit rates (larger expected quantiser step sizes).

FIG. 9 shows the PSNR at a finer scale. As we see, despite the good fit, there are still some oscillations. That is, the calculated PSNR of the B-pictures shows some dips compared with the P and I-pictures.

This oscillation is due to the fact that encoders choose larger quantiser step sizes for B-pictures compared with those of I and P pictures. This is implemented in the bit-rate allocation algorithm of the encoder through the complexity index, to assign fewer bits to B-pictures by a factor of 1.4. Hence since the assigned bits are reduced by 1.4, the quantiser step size for B-pictures, Q_(B), is raised by this factor. Therefore, in the PSNR calculation, we divide it by 1.4, as already described for Stage 4 in FIG. 1 (note that if Q_(B) has reached its saturation value of 112, we should not divide, since at this poor picture quality, quantiser step sizes of I and P pictures might have also been saturated).
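The B-picture adjustment, including the saturation exception, might be sketched as follows. The function name and parameter names are hypothetical; the values 1.4 and 112 are those given in the text.

```python
def adjust_for_picture_type(q, picture_type, b_factor=1.4, q_saturation=112):
    """Divide the step size of B-pictures by b_factor unless it is saturated."""
    if picture_type == "B" and q < q_saturation:
        return q / b_factor
    return q  # I- and P-pictures, and saturated B-pictures, are left unchanged
```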

FIG. 10 compares the PSNR of a non-uniform distribution, with the modified B-picture quantiser step sizes, against the measured value. As we see, the PSNR is now much smoother, and its variation at a finer scale is shown in FIG. 11.

Thus, so far we have shown that the quantiser step size can act in a similar manner to the mean squared error (PSNR) in the reference model. Since it is believed that the mean squared distortion should take into account the human visual system response to be a more reliable quality metric, so should the No-Reference model. In the following we show how the required parameters for the No-Reference model can be extracted from the compressed bitstream.

Viewers tolerate more distortion in the detailed areas of the pictures or near the edges. This is known as spatial masking of the human visual system. Thus the impact of the quantiser step size on picture quality in these areas can be reduced according to the amount of spatial detail.

Picture details, or spatial complexity, can also be derived from the bitstream. This can be done by counting the number of non-zero quantised coefficients per coded block in the bitstream. This is because the higher the spatial detail in a picture, the more the image energy is distributed among the coefficients.

Horizontal, vertical and diagonal edges can also be determined, by inspecting the majority of the quantised non-zero coefficients in each of these directions.

It should be mentioned that this method is reliable up to a point, but loses its strength at higher quantiser step sizes. However, for video quality metrics, this is not important, since at higher quantiser step sizes, pictures become blocky, and the picture blockiness becomes the dominant quality (distortion) indicator, which will be dealt with later.

In the experiments, we have divided the quantiser step size per macroblock, Q, by the spatial complexity factor X, derived from the number of non-zero coded coefficients. For example X can be 1, 1.2, 1.4 and 1.6 if the number of non-zero coefficients per block in the zigzag scan order is less than 3, 6, 10 or 15 respectively, and 1.8 otherwise (this is an approximation: a more sophisticated approach to finding proper values would take the MB-type into account, since in INTRA MBs, more coefficients per block are coded than in the INTER ones). Thus so far the quality metric for a picture is the average of this modified quantiser step size in that picture.

$\begin{matrix}{Q_{x} = {\sum\limits_{frame}\;\frac{Q}{X}}} & (8)\end{matrix}$

Our crude subjective comparison between the visual quality of the pictures and Q_(x) per picture shows a very strong correlation. This was done by running a segment of coded video with a pointer pointing to the frame coded (unfortunately it just shows I-pictures), against a graph of Q_(x) per picture.

Similar to spatial masking, motion of distorted objects might have different appearances, depending on the amount of motion (temporal masking). The motion vectors in the bitstream can be extracted and used as a gauge for motion in the picture. We have used the motion magnitude, defined as:

$\begin{matrix}{V = \sqrt{v_{x}^{2} + v_{y}^{2}}} & (9)\end{matrix}$

for each macroblock (MB) and then averaged them over the entire frame, to be used as an indication of amount of motion per picture. In deriving the motion magnitude from the bitstream, care has to be taken considering the picture type. For example, since the motion vectors in P-pictures refer to the previous anchor picture, their magnitudes should be divided by 3 (we have used the GoP format of M=3, N=12, that is, the anchor pictures are 3 frames apart and the period of I-pictures is 12 frames). In B-pictures, for the B₁-picture, the forward motion vector is used directly but the magnitude of the backward motion vector should be divided by 2. The reverse applies to the B₂-pictures. Of course there would not be any motion vector for I-pictures, but this does not mean that I-pictures do not have any motion.
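The scaling of motion magnitudes by picture type might be sketched as follows for the GoP format M=3 used in the experiments. The function names and the string labels for picture type and prediction direction are assumptions made for the example.

```python
import math


def motion_magnitude(vx, vy, picture_type, direction="forward"):
    """Motion magnitude of Equation (9), normalised to one frame period (M=3)."""
    magnitude = math.hypot(vx, vy)
    if picture_type == "P":
        return magnitude / 3.0  # reference anchor is 3 frames away
    if picture_type == "B1":
        return magnitude if direction == "forward" else magnitude / 2.0
    if picture_type == "B2":
        return magnitude / 2.0 if direction == "forward" else magnitude
    raise ValueError("I-pictures carry no motion vectors")


def frame_motion(magnitudes):
    """Average of the per-macroblock magnitudes over the frame."""
    magnitudes = list(magnitudes)
    return sum(magnitudes) / len(magnitudes) if magnitudes else 0.0
```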

In the experiment, as a movement indicator for the sequence, we just used the motion magnitude derived from the P-pictures. This works well, since changes in motion are much slower than the frame rate, and that of P-pictures can be used for the other frames in the sequence. Note that, at scene cuts, there would not be any motion vector for the P-pictures. Therefore a method of detecting scene cuts from the bitstream is highly desirable. This will be dealt with below.

Having found the motion value, the question then is of how it can be used to modify the video quality metric. The best way to see how it is used is in the full reference model. In our experiments, we have used a simple model that reduces the impact of distortion according to the strength of motion. This is not an optimum model, since certain distortions, like blockiness and distortions at the edges, may be more visible at higher speeds than say at mid speeds.

In our model, considering the quantiser step size, spatial and temporal masking, the quality metric QM so far is defined as:

$\begin{matrix}{{QM} = \frac{Q_{x}}{\log_{10}\left( {10 + V} \right)}} & (10)\end{matrix}$

where Q_(x) is the spatially modified average quantiser step size per frame and V is the magnitude of motion per frame.

Running a segment of video against the Quality Metric (QM) curve of the “Scorpion” sequence, at 1, 1.5 and 2 Mbit/s, shown in FIG. 12, shows a strong correlation.

Viewers can tolerate larger distortions in the darker and brighter areas of pictures than at mid ranges. This is known as contrast sensitivity of the human visual system. Thus the average intensity of the pixels within a frame can be used to modify the quality metric indicator to compensate for the contrast sensitivity.

The DC coefficient of each INTRA coded block represents the DC or the average value of the quantised pixels in that block. Thus extracting the DC coefficients of an I-picture, where all the blocks are intra coded, can indicate the overall intensity of an I-picture. Of course this cannot be applied to P and B-pictures, since they are mainly predictively coded. However, like motion, since picture darkness does not change fast, that of an I-picture may be sufficient to be representative of all the pictures in that Group of Pictures (GoP).

Note that extracting the DC values in I-pictures should be done with some care. The fact is that in most cases the DC coefficients are spatially predicted from their neighbouring DC values, and hence they should be decoded properly. However, this does not require inverse transforming the coefficients, only adding each decoded DC difference to its predicted DC value.
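A much simplified sketch of this DC reconstruction is given below. The function names are hypothetical, and the sketch assumes a single running predictor for one colour component; the points at which the predictor is reset, and the value to which it is reset, are left as a parameter here and would need to be taken from the relevant codec specification.

```python
def reconstruct_dc(differences, predictor_reset):
    """Undo differential DC coding for a run of INTRA blocks.

    differences: decoded DC differences in coding order.
    predictor_reset: predictor value at the start of the run (an assumption;
    the actual reset rule is codec dependent).
    """
    values = []
    predictor = predictor_reset
    for diff in differences:
        predictor += diff  # each DC value is the prediction plus its difference
        values.append(predictor)
    return values


def mean_intensity(dc_values):
    """Average DC value, usable as an indication of overall picture intensity."""
    return sum(dc_values) / len(dc_values)


print(reconstruct_dc([5, -2, 3], predictor_reset=128))  # [133, 131, 134]
```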

Turning now to the “blockiness” detection, at low bit rates, or larger quantiser step sizes, pictures become blocky. Although the quantiser step size is responsible for picture blockiness, it cannot be directly used as a blockiness indicator. For example, while in the plain areas of the picture some small quantiser step sizes may show picture blockiness, in the detailed areas even moderately larger quantiser step sizes may not show any blockiness, albeit some picture details can be lost.

For the compressed bitstream to indicate whether a picture is blocky or not, one has to know why pictures become blocky.

In general picture blockiness depends on the MB-type. If a macroblock is INTRA coded, then if the DC coefficient is the only non-zero coefficient of a block, such a decoded block appears blocky. This is because all the 8×8=64 pixels of the block are reconstructed from a single value (DC) and all will be equal. On the other hand, if more coefficients are coded, then pixel values within the block will be different from each other and the block does not look blocky. Thus INTRA coded blocks of all picture types can lead to blockiness, provided the blocks have only DC coefficients and all the AC coefficients are zero.

Note that INTER coded blocks (of P and B-pictures) with only a DC coefficient may not cause blockiness. This is because, even though the reconstructed frame difference for that block may have equal values, when added to their prediction, provided the prediction block is not blocky, the reconstructed pixels do not appear blocky. Blockiness may also be created in the other MB-types. This is of course picture dependent. For example in P-pictures, a non-coded MB has all the coefficients set to zero (cbp=0), but the motion vector is non-zero. In this case at the decoder all the 16×16 pixels of the MB are copied from the previous frame, displaced by a motion vector. Since values and directions of the motion vectors change from frame to frame, copying of these blocks creates some edges around the MB boundaries that make the picture look blocky, although this type of blockiness is less disturbing than the one described earlier.

In P-pictures the skipped MBs might also lead to blockiness. In these pictures, if all of the quantised coefficients in an MB are zero (cbp=0), and also the motion vector is zero, the MB is skipped (except the first and last MB in a slice, which will be treated as non-coded). The decoder takes no action on these MBs, and hence these parts of the picture are not updated (direct copy from the previous frame). Thus, if the previous picture at that position is blocky, the blockiness is transferred to the current picture.

In B-pictures, the non-coded and skipped MBs are different from those of P-pictures. In this picture-type a skipped MB not only has cbp=0, but also has to have the same motion prediction (the same value of motion and the direction of prediction) as its adjacent MB. Thus the decoded MB of this type copies the 16×16 pixels of the MB from the previous frame, displaced by some motion vector, hence creating blockiness. The non-coded MBs in B-pictures are the ones with cbp=0, but whose direction of prediction or motion vector is not the same as that of the immediately neighbouring MB. Thus non-coded MBs can also lead to blockiness.

It should be noted that even non-coded blocks (blocks with all the coefficients zero) of an MB with non-zero coded block pattern (cbp) can lead to blockiness. This is because, for these blocks, pixels from the previous frame are copied, and if motion vectors point in different directions from frame to frame, or when the previous block is itself blocky, they can appear blocky.

In summary, all types of macroblocks can lead to blockiness. However, skipped and non-coded MBs cause blockiness only if the predicted picture is blocky; otherwise they do not cause any problem. The only situation that is certain to cause blockiness is an INTRA coded MB with only one non-zero coefficient in a block, which has to be the DC one. Thus a more reliable method for blockiness detection is to look for the INTRA coded MB-type, and if any of its blocks has only a DC coefficient (all the AC coefficients are zero), that MB is marked blocky. The percentage of INTRA coded blocks with only DC coefficients over the total number of coded blocks (INTER and INTRA) is then defined as a measure of blockiness. To consider the impact of skipped and non-coded MBs on blockiness, the blockiness status of all the MBs in a frame is recorded, and updated from frame to frame. Hence if at frame N an MB is either skipped or non-coded, provided that it was blocky in frame N-1, it is now declared blocky. The blockiness status of the MB changes when it is coded, and of course it remains blocky if any of its INTRA coded blocks have only DC coefficients. Thus the overall percentage of blockiness should be recalculated accordingly. However, since we have not implemented the impact of skipped and non-coded MBs on the blockiness, we may consider the sum of INTER and INTRA coded blocks with only DC coefficients. Including INTER coded blocks may to some extent compensate for the omission of skipped and non-coded MBs.
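Both variants discussed above can be sketched in a few lines. The first function is the simpler measure (INTER plus INTRA coded blocks with only a DC coefficient, as a percentage of coded blocks); the second is the frame-to-frame status update, which the text describes but notes was not implemented. The data layout is an assumption made for illustration: each MB is a `(kind, blocks)` pair with hypothetical `kind` labels, and each block is a list of quantised coefficients with the DC term first.

```python
def is_dc_only(block):
    """True if the DC coefficient is the only non-zero coefficient."""
    return len(block) > 0 and block[0] != 0 and not any(block[1:])


def dc_only_percentage(macroblocks):
    """Percentage of coded blocks (INTER and INTRA) that are DC-only.

    A block counts as coded if it has at least one non-zero coefficient.
    """
    coded = [b for kind, blocks in macroblocks
             if kind in ("intra", "inter")
             for b in blocks if any(b)]
    if not coded:
        return 0.0
    return 100.0 * sum(is_dc_only(b) for b in coded) / len(coded)


def update_blockiness(prev_status, macroblocks):
    """Per-MB blockiness status for frame N, given the status for frame N-1."""
    status = []
    for was_blocky, (kind, blocks) in zip(prev_status, macroblocks):
        if kind in ("skipped", "non-coded"):
            status.append(was_blocky)          # inherits the earlier status
        elif kind == "intra":
            status.append(any(is_dc_only(b) for b in blocks))
        else:
            status.append(False)               # INTER coded: status is reset
    return status
```

With `prev = [False, True, True]` and a frame containing an INTRA MB with a DC-only block, a skipped MB and an INTER coded MB, `update_blockiness` marks the first two MBs as blocky and clears the third.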

FIG. 13 plots the blockiness of the “Scorpion” sequence encoded at 1, 1.5 and 2 Mbit/s. In this figure, the impact of the skipped and non-coded MBs has not been taken into account, but instead the sum of INTER and INTRA coded blocks with only DC coefficients is considered. Subjectively, the picture is blocky throughout the sequence at 1 Mbit/s. At 2 Mbit/s, there is no blockiness in the picture. At 1.5 Mbit/s, the picture is blocky in the middle of the sequence, and non-blocky in the other parts. These observations all agree with the blockiness indicator, also shown in FIG. 13.

Note that in this graph the influence of skipped and non-coded MBs has not been taken into account. They may need to be considered, since if there are many skipped and non-coded MBs in P- and B-pictures (this happens at lower bit rates), those pictures cannot contain many INTRA coded MBs. This will reduce the reliability of the detector for these pictures. The larger the proportion of skipped and non-coded MBs, the less reliable the detector is.

Scene changes (cuts) can affect the subjective quality of pictures, and hence the quality metric model. For example, due to a scene cut, the average luminance of the new scene might be very different from that of the previous scene. The notion of motion also changes at scene cuts, and there are some other effects. For P-pictures, since at scene cuts MBs are mainly INTRA coded, the number of INTER coded MBs, and hence the number of motion vectors per P-picture, is reduced, which may lead to wrong motion strength measurements.

Scene cuts can also be detected from the bitstream. However, the mechanism of detection is picture-type dependent. Since the picture-type is known from the picture header, the scene cut for the various picture types can be detected in the following manner.

In P-pictures, scene cuts are reliably detected by calculating the proportion of INTRA coded MBs. This is because, at scene cuts, frame difference signals can have larger energy than the intra MBs, and hence the encoder normally codes these MBs in INTRA mode.
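A minimal sketch of this P-picture test is given below. The text does not state a decision threshold, so the value of 0.6 is purely an assumed placeholder, as are the function name and the string labels for MB types.

```python
def is_scene_cut_p(mb_types, intra_threshold=0.6):
    """Flag a P-picture as a scene cut if most of its MBs are INTRA coded.

    mb_types        -- list of per-MB labels, e.g. "intra", "inter", "skipped"
    intra_threshold -- assumed placeholder; the source gives no value
    """
    if not mb_types:
        return False
    intra_fraction = sum(1 for t in mb_types if t == "intra") / len(mb_types)
    return intra_fraction >= intra_threshold
```

In practice the threshold would have to be tuned against sequences with known cut positions.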

In B-pictures, one has to look at the proportion of the forward, backward and interpolated motion vectors. This is because, when a scene cut occurs at a B-picture, the picture belongs to the new scene and the majority of the motion vectors will be backward (they point to the future anchor picture). More importantly, the number of interpolated ones will be very small, or perhaps nil. Thus a measure of certainty of detecting a scene cut would be the ratio of backward to forward motion vectors. Alternatively, one may use their percentages within the picture, which have to be very high for the backward vectors, but very small for the forward and especially the interpolated ones.
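The percentage-based variant of this B-picture test might look as follows. The two limits (`min_backward`, `max_interpolated`) are assumed values chosen only to make the sketch concrete; the text says "very high" and "very small" without giving numbers.

```python
def b_picture_is_scene_cut(n_forward, n_backward, n_interpolated,
                           min_backward=0.8, max_interpolated=0.05):
    """Decide whether a B-picture belongs to a new scene.

    Arguments are counts of forward, backward and interpolated motion
    vectors in the picture. Both limits are hypothetical.
    """
    total = n_forward + n_backward + n_interpolated
    if total == 0:
        return False                      # no motion vectors, no evidence
    backward_share = n_backward / total
    interpolated_share = n_interpolated / total
    return (backward_share >= min_backward
            and interpolated_share <= max_interpolated)
```

A picture with 95 backward, 2 forward and 3 interpolated vectors would be flagged; one with a roughly even split would not.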

We can even determine whether the scene cut occurred in the first or in the second B-picture. This can be done by comparing the ratio of the forward and backward motion vectors of the two B-pictures jointly. That is, if both B-pictures have mainly backward motion vectors, both B-pictures belong to the new scene. Hence the scene cut must have occurred in the first B-picture. On the other hand, if one (the first picture) has mainly forward motion vectors and the other has mainly backward ones, the scene cut must have occurred in the second B-picture. This happens because the first B-picture belongs to the old scene and the second B-picture belongs to the new scene.

We can even detect scene cuts at I-pictures. To do this one has to look at the proportion of the forward and backward motion vectors of its two previous B-pictures. If a scene cut occurs at an I-picture, then the majority of the motion vectors of its two previous B-pictures will be forward. Thus if both B-pictures have mainly forward motion vectors, there is a scene cut in the future anchor picture, which can be a P- or an I-picture. This information plus the picture type will determine whether the scene cut occurred in an I- or a P-picture. The previously mentioned method of detecting scene cuts at P-pictures (proportion of INTRA coded MBs) can be combined with this to improve the reliability of detecting scene cuts on P-pictures.
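The joint reasoning over the two B-pictures that lie between anchor pictures can be sketched as below. Each B-picture is summarised as a `(forward, backward, interpolated)` tuple of motion vector counts; the 0.8 dominance level, the function names and the returned labels are assumptions for illustration only.

```python
def dominant_direction(counts, dominance=0.8):
    """Return "forward", "backward" or "mixed" for one B-picture."""
    forward, backward, interpolated = counts
    total = forward + backward + interpolated
    if total == 0:
        return "mixed"
    if backward / total >= dominance:
        return "backward"
    if forward / total >= dominance:
        return "forward"
    return "mixed"


def locate_scene_cut(first_b, second_b, dominance=0.8):
    """Locate a scene cut from the two B-pictures preceding an anchor picture."""
    d1 = dominant_direction(first_b, dominance)
    d2 = dominant_direction(second_b, dominance)
    if d1 == "backward" and d2 == "backward":
        return "first B-picture"      # both already belong to the new scene
    if d1 == "forward" and d2 == "backward":
        return "second B-picture"     # old scene, then new scene
    if d1 == "forward" and d2 == "forward":
        return "next anchor picture"  # cut at the following I- or P-picture
    return None                       # no clear evidence of a cut
```

When the result is `"next anchor picture"`, the picture type from the picture header tells whether the cut falls on an I- or a P-picture.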

Detection of scene cuts from the compressed bitstream is useful for the accuracy of temporal masking. For example, we know that at scene cuts P-pictures are mainly INTRA coded, and hence there will not be sufficient motion vectors in the P-picture bitstream to be used in temporal masking. However, having detected that this is a scene cut, one might skip the extraction of motion vectors for this picture and use the previous value.
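The fallback described here can be sketched as follows. Note that the definition of motion strength used in this sketch (the mean motion vector magnitude per picture) is an assumption made for illustration, as are the function names and the input layout.

```python
import math


def motion_strength(motion_vectors):
    """Mean magnitude of the (dx, dy) motion vectors; an assumed definition."""
    if not motion_vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)


def motion_strength_series(pictures):
    """Per-picture motion strength, holding the previous value at scene cuts.

    pictures -- iterable of (motion_vectors, is_scene_cut) pairs
    """
    series = []
    previous = 0.0
    for motion_vectors, is_scene_cut in pictures:
        if not is_scene_cut:
            previous = motion_strength(motion_vectors)
        # At a scene cut the few available vectors are ignored and the
        # value from the previous picture is reused.
        series.append(previous)
    return series
```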

FIG. 14 shows the sum of the numbers of skipped and non-coded macroblocks of P-pictures (I-pictures do not have any) of the Scorpion sequence at 1, 1.5 and 2 Mbit/s. For the purpose of presentation this number is repeated for the other picture types. Inspection of this graph reveals, first of all, that these values on their own cannot determine blockiness; however, their numbers are so significant that they can affect the accuracy of the blockiness detector (e.g. depending on what percentage of the previous frame was blocky). Second, as we see at all bit rates, this number is significantly reduced at around frame 218, which is a scene cut. Thus in P-pictures a small number of skipped and non-coded MBs can be an indication of a scene cut.

A Video Quality Metric (VQM) can be realised in various forms. One way is to separate the blockiness, which is the main distortion of block-based encoders, from the other distortions. In the event of strong blockiness, the video quality (distortion) is solely determined by the strength of the blockiness. On the other hand, for higher quality video, a perceptual model is used.

The quantiser step size is the main parameter of the perceptual model. It is preferably modified by the spatial activity, temporal activity and contrast sensitivity parameters of the pictures to serve as a quality indicator.
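The selection between the two indicators can be sketched as follows, assuming both have already been translated to a common scale on which higher means better quality. The linear mapping from blockiness percentage to quality and the threshold of 40 are assumed placeholders; no such values are given in the text.

```python
def blockiness_to_quality(blockiness_percent):
    """Map a blockiness percentage onto a 0-100 quality scale (assumed mapping)."""
    return 100.0 - blockiness_percent


def combine_quality(perceptual_quality, blockiness_quality, threshold=40.0):
    """Select the output quality measure.

    If the blockiness-based measure indicates quality below the threshold
    (strong blockiness), it alone determines the result; otherwise the
    step-size based perceptual measure is used.
    """
    if blockiness_quality < threshold:
        return blockiness_quality
    return perceptual_quality
```

For example, a frame with 70% blocky blocks maps to a blockiness quality of 30, which is below the assumed threshold and is therefore output regardless of the perceptual score.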

Both the blockiness and perceptual indicators can be derived for each frame. The resultant values may be used for continuous monitoring of video quality. For a video segment (e.g. 10 sec long), the per-frame quality indicators might be integrated to represent a single value of video quality. This value can be compared with subjective test results to validate the model.
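One simple way to integrate the per-frame indicators over a segment is a plain average, sketched below. The text does not specify the integration method, so the arithmetic mean, the default frame rate of 25 frames per second and the function name are all assumptions.

```python
def segment_quality(per_frame, frame_rate=25.0, segment_seconds=10.0):
    """Average per-frame quality values over consecutive segments.

    Returns one value per segment; the final segment may be shorter.
    """
    frames_per_segment = int(round(frame_rate * segment_seconds))
    if frames_per_segment < 1:
        raise ValueError("segment must contain at least one frame")
    values = list(per_frame)
    return [
        sum(values[i:i + frames_per_segment]) / len(values[i:i + frames_per_segment])
        for i in range(0, len(values), frames_per_segment)
    ]
```

Other pooling rules, such as giving more weight to the worst frames, could be substituted without changing the rest of the scheme.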

1. A machine-implemented method of generating a measure of quality for a digital video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the method comprising using at least one signal processor having an input which receives said encoded video signal and an output which provides data representing said measure of quality, said at least one processor being configured to: a) generate a first quality measure which is a function of said quantizer step size parameter; b) generate a second quality measure which is a function of the number of blocks having a single transform coefficient; and c) combine the first and second measures to produce said output measure of quality; wherein the second quality measure is determined as a function of the number of blocks determined to be blocky, (i) wherein a block is defined as blocky if the block has been encoded without reference to an earlier frame of the picture and the block has only one transform coefficient; and (ii) wherein a block is defined as blocky if it has been encoded by reference to an earlier frame of the picture and the block has no transform coefficients and the corresponding block of the respective earlier picture has also been defined as blocky.
2. A machine-implemented method of generating a measure of quality for a digital video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the method comprising using at least one signal processor having an input which receives said encoded video signal and an output which provides data representing said measure of quality, said at least one processor being configured to: a) generate a first quality measure which is a function of said quantizer step size parameter; b) generate a second quality measure which is a function of the number of blocks having a single transform coefficient; c) combine the first and second measures to produce said output measure of quality; and d) make a spatial masking adjustment to said first measure, said adjustment being a function of a spatial complexity factor calculated as a function of the number of nonzero transform coefficients per encoded block.
3. A machine-implemented method of generating a measure of quality for a digital video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the method comprising using at least one signal processor having an input which receives said encoded video signal and an output which provides data representing said measure of quality, said at least one processor being configured to: a) generate a first quality measure which is a function of said quantizer step size parameter; b) generate a second quality measure which is a function of the number of blocks having a single transform coefficient; and c) combine the first and second measures to produce said output measure of quality; wherein the method is for use with signals encoded using a compression algorithm utilizing motion compensation, such that the encoded signal also includes motion vectors, the method including making a temporal masking adjustment to said first measure, said adjustment being a function of the motion vectors in each encoded block.
4. A machine-implemented method of generating a measure of quality for a digital video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the method comprising using at least one signal processor having an input which receives said encoded video signal and an output which provides data representing said measure of quality, said at least one processor being configured to: a) generate a first quality measure which is a function of said quantizer step size parameter; b) generate a second quality measure which is a function of the number of blocks having a single transform coefficient; and c) combine the first and second measures to produce said output measure of quality; wherein the step (c) comprises: a) translating the first and second measures to a common scale; b) in the event that the second measure is representative of a picture quality inferior to a threshold value, outputting the second measure; and c) otherwise, outputting the first measure.
5. An apparatus for generating a measure of quality for a video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the apparatus comprising: a) means for generating a first quality measure which is a function of said quantizer step size parameter; b) means for generating a second quality measure which is a function of the number of blocks having a single transform coefficient; and c) means for combining the first and second measures; wherein the means for generating the second quality measure is arranged to produce a measure that is a function of the number of blocks determined to be blocky, (i) wherein a block is defined as blocky if the block has been encoded without reference to an earlier frame of the picture and the block has only one transform coefficient; and (ii) wherein a block is defined as blocky if it has been encoded by reference to an earlier frame of the picture and the block has no transform coefficients and the corresponding block of the respective earlier picture has also been defined as blocky.
6. An apparatus for generating a measure of quality for a video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the apparatus comprising: a) means for generating a first quality measure which is a function of said quantizer step size parameter; b) means for generating a second quality measure which is a function of the number of blocks having a single transform coefficient; c) means for combining the first and second measures; d) means for calculating a spatial complexity factor as a function of the number of nonzero transform coefficients per encoded block, and e) means operable to make a spatial masking adjustment to said first measure, said adjustment being a function of said spatial complexity factor.
7. An apparatus for generating a measure of quality for a video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the apparatus comprising: a) means for generating a first quality measure which is a function of said quantizer step size parameter; b) means for generating a second quality measure which is a function of the number of blocks having a single transform coefficient; and c) means for combining the first and second measures; wherein the apparatus is for use with signals encoded using a compression algorithm utilizing motion compensation, such that the encoded signal also includes motion vectors, the apparatus further including: means operable to make a temporal masking adjustment to said first measure, said adjustment being a function of the motion vectors in each encoded block.
8. An apparatus for generating a measure of quality for a video signal that has been encoded using a compression algorithm utilizing a variable quantizer step size and a two-dimensional transform, such that the encoded signal includes a quantizer step size parameter and, for blocks of the picture, transform coefficients, the apparatus comprising: a) means for generating a first quality measure which is a function of said quantizer step size parameter; b) means for generating a second quality measure which is a function of the number of blocks having a single transform coefficient; and c) means for combining the first and second measures; wherein the means for combining the measures comprises: means for translating the first and second measures to a common scale, and means operable in the event that the second measure is representative of a picture quality inferior to a threshold value, to output the second measure and, otherwise, to output the first measure.